Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The main difficulty lies in learning long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. When using the auto-context algorithm based on support vector machines, we can improve the detection accuracy by about 3% and the F-score by more than 7% on both two-way and four-way pitch accent detection in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary d
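The iterative scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a small logistic regression stands in for the support vector machine so that the example needs only the standard library, the task is binary, the context window `offsets`, the number of `rounds`, and the 0.5 padding at sequence edges are arbitrary choices, and all function names are hypothetical.

```python
import math
import random


def predict_proba(w, x):
    """Probability of the positive class; the last weight is the bias."""
    z = w[-1] + sum(wj * v for wj, v in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Tiny logistic regression by batch gradient descent (stand-in for the SVM)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def add_context(local, probs, offsets):
    """Append the neighbours' previous-round probabilities to each local vector."""
    n = len(local)
    out = []
    for i, x in enumerate(local):
        # Positions outside the sequence get an uninformative 0.5 (assumption).
        ctx = [probs[i + d] if 0 <= i + d < n else 0.5 for d in offsets]
        out.append(list(x) + ctx)
    return out


def auto_context_train(local, labels, offsets=(-2, -1, 1, 2), rounds=3):
    """Train one local classifier, then `rounds` context-augmented classifiers."""
    models = [train_logreg(local, labels)]
    probs = [predict_proba(models[0], x) for x in local]
    for _ in range(rounds):
        X = add_context(local, probs, offsets)
        w = train_logreg(X, labels)
        models.append(w)
        probs = [predict_proba(w, x) for x in X]  # updated context for next round
    return models


def auto_context_predict(models, local, offsets=(-2, -1, 1, 2)):
    """Replay the trained cascade on a new sequence of local feature vectors."""
    probs = [predict_proba(models[0], x) for x in local]
    for w in models[1:]:
        X = add_context(local, probs, offsets)
        probs = [predict_proba(w, x) for x in X]
    return probs


if __name__ == "__main__":
    # Toy sequence: labels come in runs, and the single local feature is noisy.
    random.seed(0)
    labels, state = [], 0
    for _ in range(200):
        if random.random() < 0.1:
            state = 1 - state
        labels.append(state)
    local = [[lab + random.gauss(0.0, 1.0)] for lab in labels]

    models = auto_context_train(local, labels)
    probs = auto_context_predict(models, local)
    accuracy = sum((p > 0.5) == bool(t) for p, t in zip(probs, labels)) / len(labels)
    print(f"training-set accuracy after {len(models) - 1} context rounds: {accuracy:.3f}")
```

In this sketch the whole data set is treated as one sequence; with real speech data the context would more plausibly be gathered within each utterance, and the probabilities fed to later rounds would typically come from held-out predictions rather than from the training data itself.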